arXiv:1507.01768v2 [cs.DS] 13 Oct 2015 


The Restricted Isometry Property of Subsampled Fourier Matrices

Ishay Haviv    Oded Regev


Abstract 

A matrix $A \in \mathbb{C}^{q \times N}$ satisfies the restricted isometry property of order $k$ with constant $\varepsilon$ if it preserves the $\ell_2$ norm of all $k$-sparse vectors up to a factor of $1 \pm \varepsilon$. We prove that a matrix $A$ obtained by randomly sampling $q = O(k \cdot \log^2 k \cdot \log N)$ rows from an $N \times N$ Fourier matrix satisfies the restricted isometry property of order $k$ with a fixed $\varepsilon$ with high probability. This improves on Rudelson and Vershynin (Comm. Pure Appl. Math., 2008), its subsequent improvements, and Bourgain (GAFA Seminar Notes, 2014).


1 Introduction 

A matrix $A \in \mathbb{C}^{q \times N}$ satisfies the restricted isometry property of order $k$ with constant $\varepsilon > 0$ if for every $k$-sparse vector $x \in \mathbb{C}^N$ (i.e., a vector with at most $k$ nonzero entries), it holds that

$(1-\varepsilon) \cdot \|x\|_2^2 \le \|Ax\|_2^2 \le (1+\varepsilon) \cdot \|x\|_2^2$.   (1)

Intuitively, this means that every $k$ columns of $A$ are nearly orthogonal. This notion, due to Candès and Tao [11], was intensively studied during the last decade and found various applications and connections to several areas of theoretical computer science, including sparse recovery [8, 20, 27], coding theory [14], norm embeddings [6, 23], and computational complexity [4, 31, 25].

The original motivation for the restricted isometry property comes from the area of compressed sensing. There, one wishes to compress a high-dimensional sparse vector $x \in \mathbb{C}^N$ to a vector $Ax$, where $A \in \mathbb{C}^{q \times N}$ is a measurement matrix that enables reconstruction of $x$ from $Ax$. Typical goals in this context include minimizing the number of measurements $q$ and the running time of the reconstruction algorithm. It is known that the restricted isometry property of $A$, for $\varepsilon < \sqrt{2} - 1$, is a sufficient condition for reconstruction. In fact, it was shown in |[TT1 [1313 El that under this condition, reconstruction is equivalent to finding the vector of least $\ell_1$ norm among all vectors that agree with the given measurements, a task that can be formulated as a linear program lfT3IT^, and thus can be solved efficiently.

The above application leads to the challenge of finding matrices $A \in \mathbb{C}^{q \times N}$ that satisfy the restricted isometry property and have a small number of rows $q$ as a function of $N$ and $k$. (For simplicity, we ignore for now the dependence on $\varepsilon$.) A general lower bound of $q = \Omega(k \cdot \log(N/k))$ is known to follow from IfT^ (see also KlTl). Fortunately, there are matrices that match this lower bound, e.g., random matrices whose entries are chosen independently according to the normal distribution [12]. However, in many applications the measurement matrix cannot be chosen arbitrarily but is instead given by a random sample of rows from a unitary matrix, typically the discrete Fourier transform. This includes, for instance, various tests and experiments in medicine and biology (e.g., MRI [28] and ultrasound imaging [21]) and applications in astronomy (e.g., radio telescopes [32]). An advantage of subsampled Fourier matrices is that they support fast matrix-vector multiplication, and as such, are useful for efficient compression as well as for efficient reconstruction based on iterative methods (see, e.g., 12^ 1).

In recent years, with motivation from both theory and practice, an intensive line of research has aimed to study the restricted isometry property of random sub-matrices of unitary matrices. Letting $A \in \mathbb{C}^{q \times N}$ be a (normalized) matrix whose rows are chosen uniformly and independently from the rows of a unitary matrix $M \in \mathbb{C}^{N \times N}$, the goal is to prove an upper bound on $q$ for which $A$ is guaranteed to satisfy the restricted isometry property with high probability. Note that the fact that the entries of every row of $A$ are not independent makes this question much more difficult than in the case of random matrices with independent entries.

The first upper bound on the number of rows of a subsampled Fourier matrix that satisfies the restricted isometry property was $O(k \cdot \log^6 N)$, which was proved by Candès and Tao [12]. This was then improved by Rudelson and Vershynin [30] to $O(k \cdot \log^2 k \cdot \log(k \log N) \cdot \log N)$ (see also [29, 15] for a simplified analysis with better success probability). A modification of their analysis led to an improved bound of $O(k \cdot \log^3 k \cdot \log N)$ by Cheraghchi, Guruswami, and Velingker [13], who related the problem to a question on the list-decoding rate of random linear codes over finite fields. Interestingly, replacing the $\log(k \log N)$ term in the bound of [30] by $\log k$ was crucial for their application.^1 Recently, Bourgain [7] proved a bound of $O(k \cdot \log k \cdot \log^2 N)$, which is incomparable to those of [30, 13] (and has a worse dependence on $\varepsilon$; see below). We finally mention that the best known lower bound on the number of rows is $\Omega(k \cdot \log N)$ [5].

1.1 Our Contribution 

In this work, we improve the previous bounds and prove the following. 

Theorem 1.1 (Simplified). Let $M \in \mathbb{C}^{N \times N}$ be a unitary matrix with entries of absolute value $O(1/\sqrt{N})$, and let $\varepsilon > 0$ be a fixed constant. For some $q = O(k \cdot \log^2 k \cdot \log N)$, let $A \in \mathbb{C}^{q \times N}$ be a matrix whose $q$ rows are chosen uniformly and independently from the rows of $M$, multiplied by $\sqrt{N/q}$. Then, with high probability, the matrix $A$ satisfies the restricted isometry property of order $k$ with constant $\varepsilon$.
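
To make the construction in Theorem 1.1 concrete, the following is a minimal sketch in plain Python, not part of the original paper. It samples $q$ rows of the unitary DFT matrix with repetition, scales them by $\sqrt{N/q}$, and measures the ratio $\|Ax\|_2^2 / \|x\|_2^2$ on random sparse vectors. The function names and the toy parameters `N`, `k`, `q` are illustrative assumptions and do not reflect the constants hidden in the $O(\cdot)$ notation; testing random vectors also says nothing about the worst case over all $k$-sparse vectors.

```python
import cmath
import math
import random

def dft_row(j, N):
    """Row j of the N x N unitary DFT matrix (all entries have modulus 1/sqrt(N))."""
    return [cmath.exp(-2j * cmath.pi * j * l / N) / math.sqrt(N) for l in range(N)]

def subsampled_fourier(N, q, rng):
    """Pick q rows uniformly and independently (with repetition), scaled by sqrt(N/q)."""
    scale = math.sqrt(N / q)
    rows = [rng.randrange(N) for _ in range(q)]
    return [[scale * v for v in dft_row(j, N)] for j in rows]

def norm_ratio(A, x):
    """Return ||Ax||_2^2 / ||x||_2^2 for a nonzero vector x."""
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return sum(abs(v) ** 2 for v in Ax) / sum(abs(v) ** 2 for v in x)

def random_sparse(N, k, rng):
    """A random k-sparse real vector, used only as illustrative test input."""
    x = [0.0] * N
    for l in rng.sample(range(N), k):
        x[l] = rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
N, k, q = 64, 2, 48  # hypothetical toy parameters
A = subsampled_fourier(N, q, rng)
ratios = [norm_ratio(A, random_sparse(N, k, rng)) for _ in range(200)]
print(min(ratios), max(ratios))  # empirical spread of the isometry ratio
```

Because every column of $A$ has unit $\ell_2$ norm under this scaling, a 1-sparse vector is preserved exactly; any distortion comes from the correlations between distinct columns.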

The main idea in our proof is described in Section 1.3. We arrived at the proof from our recent work on list-decoding [19], where a baby version of the idea was used to bound the sample complexity of learning the class of Fourier-sparse Boolean functions.^2 Like all previous work on this question, our proof can be seen as a careful union bound applied to a sequence of progressively finer nets, a technique sometimes known as chaining. However, unlike the work of Rudelson and Vershynin [30] and its improvements [13, 15], we avoid the use of Gaussian processes, the "symmetrization process," and Dudley's inequality. Instead, and more in line with Bourgain's proof [7], we apply the chaining argument directly to the problem at hand using only elementary arguments. It would be interesting to see if our proof can be cast in the Gaussian framework of Rudelson and Vershynin.

^1 Note that the list-decoding result of [13] was later improved by Wootters [33] using different techniques.

We remark that the bounds obtained in the previous works [30, 13] have a multiplicative $O(\varepsilon^{-2})$ term, whereas a much worse dependence on $\varepsilon$ was obtained in [7]. In our proof of Theorem 1.1 we nearly obtain the best known dependence on $\varepsilon$. For simplicity of presentation we first prove in Section 3 our bound with a weaker multiplicative term of $O(\varepsilon^{-4})$, and then, in Section 4, we modify the analysis and decrease the dependence on $\varepsilon$ to $\tilde{O}(\varepsilon^{-2})$.

1.2 Related Literature 

As mentioned before, one important advantage of using subsampled Fourier matrices in compressed sensing is that they support fast, in fact nearly linear time, matrix-vector multiplication. In certain scenarios, however, one is not restricted to using subsampled Fourier matrices as the measurement matrix. The question then is whether one can decrease the number of rows using another measurement matrix, while still keeping the near-linear multiplication time. For $k < N^{1/2-\gamma}$ where $\gamma > 0$ is an arbitrary constant, the answer is yes: a construction with the optimal number $O(k \cdot \log N)$ of rows follows from works by Ailon and Chazelle [1] and Ailon and Liberty [2] (see [5]). For general $k$, Nelson, Price, and Wootters [27] suggested taking subsampled Fourier matrices and "tweaking" them by bunching together rows with random signs. Using the Gaussian-process-based analysis of [30, 13] and introducing further techniques from [22], they showed that with this construction one can reduce the number of rows by a logarithmic factor to $O(k \cdot \log^2(k \log N) \cdot \log N)$ while still keeping the nearly linear multiplication time. Our result shows that the same number of rows (in fact, a slightly smaller number) can be achieved already with the original subsampled Fourier matrices without having to use the "tweak." A natural open question is whether the "tweak" from [27] and their techniques can be combined with ours to further reduce the number of rows. An improvement in the regime of parameters of $k = \omega(\sqrt{N})$ would lead to more efficient low-dimensional embeddings based on Johnson-Lindenstrauss matrices (see, e.g., [1, 2, 23, 3, 27]).

^2 The result in [19] is weaker in two main respects. First, it is restricted to the case that $Ax$ is in $\{0,1\}^q$. This significantly simplifies the analysis and leads to a better bound on the number of rows of $A$. Second, the order of quantifiers is switched, namely it shows that for any sparse $x$, a random subsampled $A$ works with high probability, whereas for the restricted isometry property we need to show that a random $A$ works for all sparse $x$.

1.3 Proof Overview

Recall from Theorem 1.1 and from (1) that our goal is to prove that a matrix $A$ given by a random sample $Q$ of $q$ rows of $M$ satisfies with high probability that for all $k$-sparse $x$, $\|Ax\|_2^2 \approx \|x\|_2^2$. Since $M$ is unitary, the latter is equivalent to saying that $\|Ax\|_2^2 \approx \|Mx\|_2^2$. Yet another way of expressing this condition is as

$\mathbb{E}_{j \in Q}[(|Mx|^2)_j] \approx \mathbb{E}_{j \in [N]}[(|Mx|^2)_j]$,

i.e., that a sample $Q \subseteq [N]$ of $q$ coordinates of the vector $|Mx|^2$ gives a good approximation to the average of all its coordinates. Here, $|Mx|^2$ refers to the vector obtained by taking the squared absolute value of $Mx$ coordinate-wise. For reasons that will become clear soon, it will be convenient to assume without loss of generality that $\|x\|_1 = 1$. With this scaling, the sparsity assumption implies that $\|Mx\|_2^2$ is not too small (namely at least $1/k$), and this will determine the amount of additive error we can afford in the approximation above. This is the only way we use the sparsity assumption.

At a high level, the proof proceeds by defining a finite set of vectors $\mathcal{H}$ that forms a net, i.e., a set satisfying that any vector $|Mx|^2$ is close to one of the vectors in $\mathcal{H}$. We then argue using the Chernoff-Hoeffding bound that for any fixed vector $h \in \mathcal{H}$, a sample of $q$ coordinates gives a good approximation to the average of $h$. Finally, we complete the proof by a union bound over all $h \in \mathcal{H}$.

In order to define the set $\mathcal{H}$ we notice that since $\|x\|_1 = 1$, $Mx$ can be seen as a weighted average of the columns of $M$ (possibly with signs). In other words, we can think of $Mx$ as the expectation of a vector-valued random variable given by a certain probability distribution over the columns of $M$. Using the Chernoff-Hoeffding bound again, this implies that we can approximate $Mx$ well by taking the average over a small number of samples from this distribution. We then let $\mathcal{H}$ be the set of all possible such averages, and a bound on the cardinality of $\mathcal{H}$ follows easily (basically $N$ raised to the number of samples). This technique is sometimes referred to as Maurey's empirical method.
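
The sampling idea above can be illustrated with a small sketch, which is not taken from the paper. It assumes a real vector $x$ with $\|x\|_1 = 1$ and uses an unnormalized $8 \times 8$ Hadamard matrix as a stand-in for $M$ (so all entries have absolute value 1, matching the normalization $\|M\|_\infty = 1$ used later in the proofs). The names `hadamard` and `maurey_estimate` are hypothetical helpers introduced only for this illustration.

```python
import random

def hadamard(n):
    """Sylvester construction of a 2^n x 2^n matrix with entries +1 or -1."""
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def maurey_estimate(M, x, num_samples, rng):
    """Average signed columns of M, drawn with probability |x_l| (assumes sum |x_l| = 1)."""
    N = len(M)
    cols = rng.choices(range(N), weights=[abs(v) for v in x], k=num_samples)
    est = [0.0] * N
    for l in cols:
        sign = 1.0 if x[l] > 0 else -1.0
        for j in range(N):
            est[j] += sign * M[j][l] / num_samples
    return est

rng = random.Random(1)
M = hadamard(3)
x = [0.5, 0.0, -0.3, 0.0, 0.0, 0.2, 0.0, 0.0]  # real vector with l1 norm 1
exact = [sum(M[j][l] * x[l] for l in range(8)) for j in range(8)]
approx_Mx = maurey_estimate(M, x, 4000, rng)
max_err = max(abs(a - b) for a, b in zip(exact, approx_Mx))
print(max_err)  # small: each coordinate is an average of 4000 bounded samples
```

The number of distinct averages of this form is at most $N$ raised to the number of samples, which is the cardinality bound the overview refers to.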

The argument above is actually oversimplified, and carrying it out leads to rather bad bounds on $q$. As a result, our proof in Section 3 is slightly more delicate. Namely, instead of just one set $\mathcal{H}$, we have a sequence of sets, $\mathcal{H}_1, \mathcal{H}_2, \ldots$, each being responsible for approximating a different scale of $|Mx|^2$. The first set $\mathcal{H}_1$ approximates $|Mx|^2$ on coordinates on which its value is highest; since the value is high, we need fewer samples in order to approximate it well, as a result of which the set $\mathcal{H}_1$ is small. The next set $\mathcal{H}_2$ approximates $|Mx|^2$ on coordinates on which its value is somewhat smaller, and is therefore a bigger set, and so on and so forth. The end result is that any vector $|Mx|^2$ can be approximately decomposed into a sum $\sum_i h^{(i)}$ with $h^{(i)} \in \mathcal{H}_i$. To complete the proof, we argue that a random choice of $q$ coordinates approximates all the vectors in all the $\mathcal{H}_i$ well. The reason working with several $\mathcal{H}_i$ leads to the better bound stated in Theorem 1.1 is this: even though as $i$ increases the number of vectors in $\mathcal{H}_i$ grows, the quality of approximation that we need the $q$ coordinates to provide decreases, since the value of $|Mx|^2$ there is small and so errors are less significant. It turns out that these two requirements on $q$ balance each other perfectly, leading to the desired bound on $q$.

2 Preliminaries

Notation. The notation $x \approx_{\varepsilon,\alpha} y$ means that $x \in [(1-\varepsilon)y - \alpha, (1+\varepsilon)y + \alpha]$. For a matrix $M$, we denote by $M^{(\ell)}$ the $\ell$th column of $M$ and define $\|M\|_\infty = \max_{i,j} |M_{i,j}|$.
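
As a small illustration of this notation (a sketch, not from the paper), the relation $x \approx_{\varepsilon,\alpha} y$ and the entrywise maximum can be written as two helper functions. The names `approx` and `max_abs_entry` are hypothetical. Note that the relation is not symmetric in $x$ and $y$.

```python
def approx(x, y, eps, alpha):
    """True iff x lies in [(1 - eps) * y - alpha, (1 + eps) * y + alpha]."""
    return (1 - eps) * y - alpha <= x <= (1 + eps) * y + alpha

def max_abs_entry(M):
    """The quantity ||M||_inf = max_{i,j} |M_{i,j}| for a matrix given as a list of rows."""
    return max(abs(v) for row in M for v in row)

print(approx(0.95, 1.0, 0.1, 0.0))   # multiplicative slack only
print(approx(1.2, 1.0, 0.1, 0.05))   # outside both the multiplicative and additive slack
print(max_abs_entry([[1, -2], [0.5j, 1.5]]))
```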

The Restricted Isometry Property. The restricted isometry property is defined as follows. 

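Definition 2.1. We say that a matrix $A \in \mathbb{C}^{q \times N}$ satisfies the restricted isometry property of order $k$ with constant $\varepsilon$ if for every $k$-sparse vector $x \in \mathbb{C}^N$ it holds that

$(1-\varepsilon) \cdot \|x\|_2^2 \le \|Ax\|_2^2 \le (1+\varepsilon) \cdot \|x\|_2^2$.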

Chernoff-Hoeffding Bounds. We now state the Chernoff-Hoeffding bound (see, e.g., IMl) and derive several simple corollaries that will be used extensively later.

Theorem 2.2. Let $X_1, \ldots, X_N$ be $N$ identically distributed independent random variables in $[0,a]$ satisfying $\mathbb{E}[X_i] = \mu$ for all $i$, and denote $\bar{X} = \frac{1}{N} \cdot \sum_{i=1}^{N} X_i$. Then there exists a universal constant $C$ such that for every $0 < \varepsilon \le 1/2$, the probability that $\bar{X} \approx_{\varepsilon,0} \mu$ is at least $1 - 2e^{-C \cdot N\mu\varepsilon^2/a}$.

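Corollary 2.3. Let $X_1, \ldots, X_N$ be $N$ identically distributed independent random variables in $[0,a]$ satisfying $\mathbb{E}[X_i] = \mu$ for all $i$, and denote $\bar{X} = \frac{1}{N} \cdot \sum_{i=1}^{N} X_i$. Then there exists a universal constant $C$ such that for every $0 < \varepsilon \le 1/2$ and $\alpha > 0$, the probability that $\bar{X} \approx_{\varepsilon,\alpha} \mu$ is at least $1 - 2e^{-C \cdot N\alpha\varepsilon/a}$.

Proof: If $\mu \ge \alpha/\varepsilon$ then by Theorem 2.2 the probability that $\bar{X} \approx_{\varepsilon,0} \mu$ is at least $1 - 2e^{-C \cdot N\mu\varepsilon^2/a}$, which is at least $1 - 2e^{-C \cdot N\alpha\varepsilon/a}$. Otherwise, Theorem 2.2 for $\tilde{\varepsilon} = \alpha/\mu > \varepsilon$ implies that the probability that $\bar{X} \approx_{\tilde{\varepsilon},0} \mu$, hence $\bar{X} \approx_{0,\alpha} \mu$, is at least $1 - 2e^{-C \cdot N\mu\tilde{\varepsilon}^2/a}$, and the latter is at least $1 - 2e^{-C \cdot N\alpha\varepsilon/a}$. ■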

Corollary 2.4. Let $X_1, \ldots, X_N$ be $N$ identically distributed independent random variables in $[-a,+a]$ satisfying $\mathbb{E}[X_i] = \mu$ and $\mathbb{E}[|X_i|] = \tilde{\mu}$ for all $i$, and denote $\bar{X} = \frac{1}{N} \cdot \sum_{i=1}^{N} X_i$. Then there exists a universal constant $C$ such that for every $0 < \varepsilon' \le 1/2$ and $\alpha > 0$, the probability that $\bar{X} \approx_{0,\,\varepsilon' \cdot \tilde{\mu} + \alpha} \mu$ is at least $1 - 4e^{-C \cdot N\alpha\varepsilon'/a}$.

Proof: The corollary follows by applying Corollary 2.3 to $\max(X_i, 0)$ and to $-\min(X_i, 0)$. ■

We end with the additive form of the bound, followed by an easy extension to the complex case.

Corollary 2.5. Let $X_1, \ldots, X_N$ be $N$ identically distributed independent random variables in $[-a,+a]$ satisfying $\mathbb{E}[X_i] = \mu$ for all $i$, and denote $\bar{X} = \frac{1}{N} \cdot \sum_{i=1}^{N} X_i$. Then there exists a universal constant $C$ such that for every $b > 0$, the probability that $\bar{X} \approx_{0,b} \mu$ is at least $1 - 4e^{-C \cdot Nb^2/a^2}$.

Proof: We can assume that $b \le 2a$. The corollary follows by applying Corollary 2.4 to, say, $\alpha = 3b/4$ and $\varepsilon' = b/(4a)$. ■
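
The parameter choice in this proof can be checked with exact rational arithmetic. The sketch below is illustrative and not part of the paper. It assumes the reading $\alpha = 3b/4$, $\varepsilon' = b/(4a)$ and uses the bound $\mathbb{E}[|X_i|] \le a$; the helper name and the sample values of $a$ and $b$ are hypothetical.

```python
from fractions import Fraction

def corollary_2_4_parameters(a, b):
    """The choice alpha = 3b/4 and eps' = b/(4a), meaningful for 0 < b <= 2a."""
    alpha = Fraction(3, 4) * b
    eps_prime = b / (4 * a)
    return alpha, eps_prime

a, b = Fraction(5), Fraction(3)           # sample values with b <= 2a
alpha, eps_prime = corollary_2_4_parameters(a, b)
mu_tilde_max = a                          # E[|X_i|] <= a because |X_i| <= a
additive_error = eps_prime * mu_tilde_max + alpha   # worst-case value of eps' * mu_tilde + alpha
exponent_factor = alpha * eps_prime / a   # the factor alpha * eps' / a in the exponent of Corollary 2.4
print(eps_prime, additive_error, exponent_factor)
```

With these choices the additive error is at most $b$, the condition $\varepsilon' \le 1/2$ holds exactly when $b \le 2a$, and the exponent factor equals $3b^2/(16a^2)$, which is where the $b^2/a^2$ dependence comes from.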

Corollary 2.6. Let $X_1, \ldots, X_N$ be $N$ identically distributed independent complex-valued random variables satisfying $|X_i| \le a$ and $\mathbb{E}[X_i] = \mu$ for all $i$, and denote $\bar{X} = \frac{1}{N} \cdot \sum_{i=1}^{N} X_i$. Then there exists a universal constant $C$ such that for every $b > 0$, the probability that $|\bar{X}| \approx_{0,b} |\mu|$ is at least $1 - 8e^{-C \cdot Nb^2/a^2}$.

Proof: By Corollary 2.5 applied to the real and imaginary parts of the random variables $X_1, \ldots, X_N$ it follows that for a universal constant $C$, the probability that $\mathrm{Re}(\bar{X}) \approx_{0,b/\sqrt{2}} \mathrm{Re}(\mu)$ and $\mathrm{Im}(\bar{X}) \approx_{0,b/\sqrt{2}} \mathrm{Im}(\mu)$ is at least $1 - 8e^{-C \cdot Nb^2/a^2}$. In that event $|\bar{X} - \mu| \le b$, so by the triangle inequality we have $|\bar{X}| \approx_{0,b} |\mu|$, as required. ■

3 The Simpler Analysis 

In this section we prove our result with a multiplicative term of $O(\varepsilon^{-4})$ in the bound. We start with the following theorem.

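Theorem 3.1. For a sufficiently large $N$, a matrix $M \in \mathbb{C}^{N \times N}$, and sufficiently small $\varepsilon, \eta > 0$, the following holds. For some $q = O(\varepsilon^{-3}\eta^{-1} \log N \cdot \log^2(1/\eta))$, let $Q$ be a multiset of $q$ uniform and independent random elements of $[N]$. Then, with probability $1 - 2^{-\Omega(\varepsilon^{-2} \cdot \log N \cdot \log(1/\eta))}$, it holds that for every $x \in \mathbb{C}^N$,

$\mathbb{E}_{j \in Q}[|(Mx)_j|^2] \approx_{\varepsilon,\; \eta \cdot \|x\|_1^2 \cdot \|M\|_\infty^2} \mathbb{E}_{j \in [N]}[|(Mx)_j|^2]$.

Throughout the proof we assume without loss of generality that the matrix $M \in \mathbb{C}^{N \times N}$ satisfies $\|M\|_\infty = 1$. For $\varepsilon, \eta > 0$, we denote $t = \log_2(1/\eta)$, $r = \log_2(1/\varepsilon^2)$, and $\gamma = \eta/(2t)$. We start by defining several vector sets as follows.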

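The Vector Sets $\mathcal{G}_\ell$. For every $1 \le \ell \le t + r$, let $\mathcal{G}_\ell$ denote the set of all vectors $g^{(\ell)} \in \mathbb{C}^N$ that can be represented as

$g^{(\ell)} = \frac{\sqrt{2}}{|F|} \cdot \sum_{(\ell',s) \in F} (-1)^{s/2} \cdot M^{(\ell')}$   (2)

for a multiset $F$ of $O(2^\ell \cdot \log(1/\gamma))$ pairs in $[N] \times \{0,1,2,3\}$. A trivial counting argument gives the following.

Claim 3.2. For every $1 \le \ell \le t + r$, $|\mathcal{G}_\ell| \le N^{O(2^\ell \cdot \log(1/\gamma))}$.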

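The Vector Sets $\mathcal{H}_i$. For a $t$-tuple of vectors $(g^{(1+r)}, \ldots, g^{(t+r)}) \in \mathcal{G}_{1+r} \times \cdots \times \mathcal{G}_{t+r}$ and for $1 \le i \le t$, let $B_i$ be the set of all $j \in [N]$ for which $i$ is the smallest index satisfying $|g_j^{(i+r)}| \ge 2 \cdot 2^{-i/2}$. For such $i$, define the vector $h^{(i)}$ by

$h_j^{(i)} = \min(|g_j^{(i+r)}|^2,\; 9 \cdot 2^{-i}) \cdot 1_{j \in B_i}$.   (3)

Let $\mathcal{H}_i$ be the set of all vectors $h^{(i)}$ that can be obtained in this way.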
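
The definition of the sets $B_i$ and the vectors $h^{(i)}$ can be sketched in code as follows. This is an illustration rather than the authors' formulation. It assumes the reading of (3) in which $h^{(i)}$ is supported on $B_i$ (as the proof of Lemma 3.6 uses), takes the magnitudes $|g_j^{(i+r)}|$ as given input, and the function name `level_sets` is hypothetical.

```python
def level_sets(g_abs, N):
    """g_abs[i-1][j] holds |g_j^{(i+r)}| for i = 1..t.

    Returns (B, h) where B[i-1] is the set B_i and h[i-1] is the vector h^{(i)}.
    """
    t = len(g_abs)
    B = [set() for _ in range(t)]
    h = [[0.0] * N for _ in range(t)]
    for j in range(N):
        for i in range(1, t + 1):
            if g_abs[i - 1][j] >= 2 * 2 ** (-i / 2):
                B[i - 1].add(j)  # i is the smallest index that passes the threshold
                h[i - 1][j] = min(g_abs[i - 1][j] ** 2, 9 * 2 ** (-i))
                break
    return B, h

# Toy input with t = 2 scales and N = 3 coordinates (values chosen by hand).
g_abs = [[1.5, 0.9, 0.1],
         [1.5, 1.1, 0.1]]
B, h = level_sets(g_abs, 3)
print(B, h)
```

Each coordinate lands in at most one $B_i$, so for every $j$ at most one of the values $h_j^{(1)}, \ldots, h_j^{(t)}$ is nonzero.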
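
Claim 3.3. For every $1 \le i \le t$, $|\mathcal{H}_i| \le N^{O(\varepsilon^{-2} \cdot 2^i \cdot \log(1/\gamma))}$.

Proof: Observe that every $h^{(i)} \in \mathcal{H}_i$ is fully defined by some $(g^{(1+r)}, \ldots, g^{(i+r)}) \in \mathcal{G}_{1+r} \times \cdots \times \mathcal{G}_{i+r}$. Hence

$|\mathcal{H}_i| \le |\mathcal{G}_{1+r}| \cdots |\mathcal{G}_{i+r}| \le N^{O(\log(1/\gamma)) \cdot (2^{1+r} + \cdots + 2^{i+r})} \le N^{O(\log(1/\gamma)) \cdot 2^{i+r+1}}$.

Using the definition of $r$, the claim follows. ■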

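Lemma 3.4. For every $\tilde{\eta} > 0$ and some $q = O(\varepsilon^{-3}\tilde{\eta}^{-1} \log N \cdot \log(1/\gamma))$, let $Q$ be a multiset of $q$ uniform and independent random elements of $[N]$. Then, with probability $1 - 2^{-\Omega(\varepsilon^{-2} \cdot \log N \cdot \log(1/\gamma))}$, it holds that for all $1 \le i \le t$ and $h^{(i)} \in \mathcal{H}_i$,

$\mathbb{E}_{j \in Q}[h_j^{(i)}] \approx_{\varepsilon,\tilde{\eta}} \mathbb{E}_{j \in [N]}[h_j^{(i)}]$.

Proof: Fix some $1 \le i \le t$ and a vector $h^{(i)} \in \mathcal{H}_i$, and denote $\mu = \mathbb{E}_{j \in [N]}[h_j^{(i)}]$. By Corollary 2.3, applied with $\alpha = \tilde{\eta}$ and $a = 9 \cdot 2^{-i}$ (recall that $h_j^{(i)} \le a$ for every $j$), with probability $1 - 2^{-\Omega(2^i \cdot q \cdot \varepsilon \cdot \tilde{\eta})}$ it holds that $\mathbb{E}_{j \in Q}[h_j^{(i)}] \approx_{\varepsilon,\tilde{\eta}} \mu$. Using Claim 3.3, the union bound over all the vectors in $\mathcal{H}_i$ implies that the probability that some $h^{(i)} \in \mathcal{H}_i$ does not satisfy $\mathbb{E}_{j \in Q}[h_j^{(i)}] \approx_{\varepsilon,\tilde{\eta}} \mu$ is at most

$N^{O(\varepsilon^{-2} \cdot 2^i \cdot \log(1/\gamma))} \cdot 2^{-\Omega(2^i \cdot q \cdot \varepsilon \cdot \tilde{\eta})} \le 2^{-\Omega(\varepsilon^{-2} \cdot 2^i \cdot \log N \cdot \log(1/\gamma))}$.

We complete the proof by a union bound over $i$. ■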

Approximating the Vectors Mx. 

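Lemma 3.5. For every vector $x \in \mathbb{C}^N$ with $\|x\|_1 = 1$, every multiset $Q \subseteq [N]$, and every $1 \le \ell \le t + r$, there exists a vector $g \in \mathcal{G}_\ell$ that satisfies $|(Mx)_j| \approx_{0,2^{-\ell/2}} |g_j|$ for all but at most $\gamma$ fraction of $j \in [N]$ and for all but at most $\gamma$ fraction of $j \in Q$.

Proof: Observe that for every $\ell \in [N]$ there exist $p_{\ell,0}, p_{\ell,1}, p_{\ell,2}, p_{\ell,3} \ge 0$ that satisfy

$\sum_{s=0}^{3} p_{\ell,s} = |x_\ell|$ and $\sqrt{2} \cdot \sum_{s=0}^{3} p_{\ell,s} \cdot (-1)^{s/2} = x_\ell$.

Notice that the assumption $\|x\|_1 = 1$ implies that the numbers $p_{\ell,s}$ form a probability distribution. Thus, the vector $Mx$ can be represented as

$Mx = \sum_{\ell=1}^{N} x_\ell \cdot M^{(\ell)} = \sum_{\ell=1}^{N} \sum_{s=0}^{3} p_{\ell,s} \cdot \sqrt{2} \cdot (-1)^{s/2} \cdot M^{(\ell)} = \mathbb{E}_{(\ell,s) \sim D}[\sqrt{2} \cdot (-1)^{s/2} \cdot M^{(\ell)}]$,

where $D$ is the distribution that assigns probability $p_{\ell,s}$ to the pair $(\ell,s)$.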
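
One explicit way to obtain such weights can be sketched as follows; the paper only asserts their existence, so this particular construction is an assumption made for illustration. It reads $(-1)^{s/2}$ as $i^s$, i.e., the four directions $1, i, -1, -i$, places the real and imaginary parts of $x_\ell/\sqrt{2}$ on the matching directions, and spreads the leftover mass equally over all four directions, where it cancels. The function name `four_point_weights` is hypothetical.

```python
import math

def four_point_weights(z):
    """Nonnegative p_0..p_3 with sum |z| and sqrt(2) * sum_s p_s * i**s equal to z."""
    w = z / math.sqrt(2)
    p = [max(w.real, 0.0), max(w.imag, 0.0), max(-w.real, 0.0), max(-w.imag, 0.0)]
    # |Re w| + |Im w| <= sqrt(2) * |w| = |z|, so the slack is nonnegative
    # (clamped at 0 to guard against floating-point rounding).
    slack = max(abs(z) - sum(p), 0.0)
    return [v + slack / 4 for v in p]  # equal mass on 1, i, -1, -i sums to zero

z = 0.3 - 0.4j
p = four_point_weights(z)
reconstructed = math.sqrt(2) * sum(ps * (1j) ** s for s, ps in enumerate(p))
print(p, reconstructed)
```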


Let $F$ be a multiset of $O(2^\ell \cdot \log(1/\gamma))$ independent random samples from $D$, and let $g \in \mathcal{G}_\ell$ be the vector corresponding to $F$ as in (2). By Corollary 2.6 applied with $a = \sqrt{2}$ (recall that $\|M\|_\infty = 1$) and $b = 2^{-\ell/2}$, for every $j \in [N]$ the probability that

$|(Mx)_j| \approx_{0,2^{-\ell/2}} |g_j|$   (4)

is at least $1 - \gamma/4$. It follows that the expected number of $j \in [N]$ that do not satisfy (4) is at most $\gamma N/4$, so by Markov's inequality the probability that the number of $j \in [N]$ that do not satisfy (4) is at most $\gamma N$ is at least $3/4$. Similarly, the expected number of $j \in Q$ that do not satisfy (4) is at most $\gamma|Q|/4$, so by Markov's inequality, with probability at least $3/4$ it holds that the number of $j \in Q$ that do not satisfy (4) is at most $\gamma|Q|$. By a union bound, both events occur simultaneously with positive probability. It follows that there exists a vector $g \in \mathcal{G}_\ell$ for which (4) holds for all but at most $\gamma$ fraction of $j \in [N]$ and for all but at most $\gamma$ fraction of $j \in Q$, as required. ■


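Lemma 3.6. For every multiset $Q \subseteq [N]$ and every vector $x \in \mathbb{C}^N$ with $\|x\|_1 = 1$ there exists a $t$-tuple of vectors $(h^{(1)}, \ldots, h^{(t)}) \in \mathcal{H}_1 \times \cdots \times \mathcal{H}_t$ for which

1. $\mathbb{E}_{j \in Q}[|(Mx)_j|^2] \approx_{O(\varepsilon),O(\eta)} \mathbb{E}_{j \in Q}[\sum_{i=1}^{t} h_j^{(i)}]$, and

2. $\mathbb{E}_{j \in [N]}[|(Mx)_j|^2] \approx_{O(\varepsilon),O(\eta)} \mathbb{E}_{j \in [N]}[\sum_{i=1}^{t} h_j^{(i)}]$.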

Proof: By Lemma 3.5, for every $1 \le i \le t$ there exists a vector $g^{(i+r)} \in \mathcal{G}_{i+r}$ that satisfies

$|(Mx)_j| \approx_{0,2^{-(i+r)/2}} |g_j^{(i+r)}|$   (5)

for all but at most $\gamma$ fraction of $j \in [N]$ and for all but at most $\gamma$ fraction of $j \in Q$. We say that $j \in [N]$ is good if (5) holds for every $1 \le i \le t$, and otherwise that it is bad. Notice that all but at most $t\gamma$ fraction of $j \in [N]$ are good and that all but at most $t\gamma$ fraction of $j \in Q$ are good. Let $(h^{(1)}, \ldots, h^{(t)})$ and $(B_1, \ldots, B_t)$ be the vectors and sets associated with $(g^{(1+r)}, \ldots, g^{(t+r)})$ as defined in (3). We claim that $h^{(1)}, \ldots, h^{(t)}$ satisfy the requirements of the lemma.

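We first show that for every good $j$ it holds that $|(Mx)_j|^2 \approx_{3\varepsilon,9\eta} \sum_{i=1}^{t} h_j^{(i)}$. To obtain it, we observe that if $j \in B_i$ for some $i$, then

$2 \cdot 2^{-i/2} \le |g_j^{(i+r)}| \le 3 \cdot 2^{-i/2}$.   (6)

The lower bound follows simply from the definition of $B_i$. For the upper bound, which trivially holds for $i = 1$, assume that $i \ge 2$, and notice that the definition of $B_i$ implies that $|g_j^{(i-1+r)}| < 2 \cdot 2^{-(i-1)/2}$. Using (5), and assuming that $\varepsilon$ is sufficiently small, we obtain that

$|g_j^{(i+r)}| \le |(Mx)_j| + 2^{-(i+r)/2} \le |g_j^{(i-1+r)}| + 2^{-(i-1+r)/2} + 2^{-(i+r)/2} \le 2^{-i/2} \cdot (2 \cdot 2^{1/2} + 2^{1/2} \cdot \varepsilon + \varepsilon) \le 3 \cdot 2^{-i/2}$.

Hence, by the upper bound in (6), for a good $j \in B_i$ we have $h_j^{(i)} = |g_j^{(i+r)}|^2$ and $h_j^{(i')} = 0$ for $i' \ne i$. Observe that by the lower bound in (6),

$|(Mx)_j| \in [\,|g_j^{(i+r)}| - 2^{-(i+r)/2},\; |g_j^{(i+r)}| + 2^{-(i+r)/2}\,] \subseteq [\,(1-\varepsilon) \cdot |g_j^{(i+r)}|,\; (1+\varepsilon) \cdot |g_j^{(i+r)}|\,]$,

and that this implies that $|(Mx)_j|^2 \approx_{3\varepsilon,0} h_j^{(i)}$. On the other hand, in case that $j$ is good but does not belong to any $B_i$, recalling that $t = \log_2(1/\eta)$, it follows that

$|(Mx)_j| \le |g_j^{(t+r)}| + 2^{-(t+r)/2} \le 3 \cdot 2^{-t/2} = 3\sqrt{\eta}$,

and thus $|(Mx)_j|^2 \approx_{0,9\eta} 0 = \sum_{i=1}^{t} h_j^{(i)}$.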
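Finally, for every bad $j$ we have

$\bigl|\, |(Mx)_j|^2 - \sum_{i=1}^{t} h_j^{(i)} \,\bigr| \le \max\bigl( |(Mx)_j|^2,\; \sum_{i=1}^{t} h_j^{(i)} \bigr) \le 2$.

Since at most $t\gamma$ fraction of the elements in $[N]$ and in $Q$ are bad, their effect on the difference between the expectations in the lemma can be bounded by $2t\gamma$. By our choice of $\gamma$, this is $\eta$, completing the proof of the lemma. ■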


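Finally, we are ready to prove Theorem 3.1.

Proof of Theorem 3.1: By Lemma 3.4, applied with $\tilde{\eta} = \eta/(2t)$, a random multiset $Q$ of size

$q = O(\varepsilon^{-3} \cdot (t/\eta) \cdot \log N \cdot \log(1/\gamma)) = O(\varepsilon^{-3}\eta^{-1} \log N \cdot \log^2(1/\eta))$

satisfies with probability $1 - 2^{-\Omega(\varepsilon^{-2} \cdot \log N \cdot \log(1/\eta))}$ that for all $1 \le i \le t$ and $h^{(i)} \in \mathcal{H}_i$,

$\mathbb{E}_{j \in Q}[h_j^{(i)}] \approx_{\varepsilon,\eta/(2t)} \mathbb{E}_{j \in [N]}[h_j^{(i)}]$,

in which case we also have

$\mathbb{E}_{j \in Q}[\sum_{i=1}^{t} h_j^{(i)}] \approx_{\varepsilon,\eta/2} \mathbb{E}_{j \in [N]}[\sum_{i=1}^{t} h_j^{(i)}]$.

We show that a $Q$ with the above property satisfies the requirement of the theorem. Let $x \in \mathbb{C}^N$ be a vector, and assume without loss of generality that $\|x\|_1 = 1$. By Lemma 3.6, there exists a $t$-tuple of vectors $(h^{(1)}, \ldots, h^{(t)}) \in \mathcal{H}_1 \times \cdots \times \mathcal{H}_t$ satisfying Items 1 and 2 there. As a result,

$\mathbb{E}_{j \in Q}[|(Mx)_j|^2] \approx_{O(\varepsilon),O(\eta)} \mathbb{E}_{j \in [N]}[|(Mx)_j|^2]$,

and we are done. ■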


3.1 The Restricted Isometry Property 

Equipped with Theorem 3.1 it is easy to derive our result on the restricted isometry property (see Definition 2.1) of random sub-matrices of unitary matrices.

Theorem 3.7. For sufficiently large $N$ and $k$, a unitary matrix $M \in \mathbb{C}^{N \times N}$ satisfying $\|M\|_\infty \le O(1/\sqrt{N})$, and a sufficiently small $\varepsilon > 0$, the following holds. For some $q = O(\varepsilon^{-4} \cdot k \cdot \log^2(k/\varepsilon) \cdot \log N)$, let $A \in \mathbb{C}^{q \times N}$ be a matrix whose $q$ rows are chosen uniformly and independently from the rows of $M$, multiplied by $\sqrt{N/q}$. Then, with probability $1 - 2^{-\Omega(\varepsilon^{-2} \cdot \log N \cdot \log(k/\varepsilon))}$, the matrix $A$ satisfies the restricted isometry property of order $k$ with constant $\varepsilon$.

Proof: Let $Q$ be a multiset of $q$ uniform and independent random elements of $[N]$, defining a matrix $A$ as above. Notice that by the Cauchy-Schwarz inequality, any $k$-sparse vector $x \in \mathbb{C}^N$ with $\|x\|_2 = 1$ satisfies $\|x\|_1 \le \sqrt{k}$. Applying Theorem 3.1 with $\varepsilon/2$ and some $\eta = \Omega(\varepsilon/k)$, we get that with probability $1 - 2^{-\Omega(\varepsilon^{-2} \cdot \log N \cdot \log(k/\varepsilon))}$, for every $k$-sparse vector $x \in \mathbb{C}^N$ with $\|x\|_2 = 1$,

$\|Ax\|_2^2 = N \cdot \mathbb{E}_{j \in Q}[|(Mx)_j|^2] \approx_{\varepsilon/2,\varepsilon/2} N \cdot \mathbb{E}_{j \in [N]}[|(Mx)_j|^2] = \|Mx\|_2^2 = 1$.

It follows that every $k$-sparse vector $x \in \mathbb{C}^N$ satisfies $\|Ax\|_2^2 \approx_{\varepsilon,0} \|x\|_2^2$, hence $A$ satisfies the restricted isometry property of order $k$ with constant $\varepsilon$. ■
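
The two identities used in this proof, $\|Ax\|_2^2 = N \cdot \mathbb{E}_{j \in Q}[|(Mx)_j|^2]$ and $N \cdot \mathbb{E}_{j \in [N]}[|(Mx)_j|^2] = \|Mx\|_2^2$, together with the Cauchy-Schwarz bound $\|x\|_1 \le \sqrt{k}$, can be checked numerically. The sketch below is illustrative only: it takes $M$ to be the unitary DFT matrix, and the sizes and the 2-sparse vector are hypothetical toy choices.

```python
import cmath
import math
import random

def dft_entry(j, l, N):
    """Entry (j, l) of the N x N unitary DFT matrix."""
    return cmath.exp(-2j * cmath.pi * j * l / N) / math.sqrt(N)

N, q, k = 16, 6, 2
rng = random.Random(2)
Q = [rng.randrange(N) for _ in range(q)]   # multiset of sampled row indices

x = [0j] * N
x[3], x[10] = 0.6, -0.8j                   # 2-sparse with ||x||_2 = 1

Mx = [sum(dft_entry(j, l, N) * x[l] for l in range(N)) for j in range(N)]

norm_Ax_sq = sum(abs(math.sqrt(N / q) * Mx[j]) ** 2 for j in Q)   # ||Ax||_2^2
sample_avg = N * sum(abs(Mx[j]) ** 2 for j in Q) / q              # N * E_{j in Q}
full_avg = N * sum(abs(v) ** 2 for v in Mx) / N                   # N * E_{j in [N]}
l1_norm = sum(abs(v) for v in x)

print(norm_Ax_sq, sample_avg, full_avg, l1_norm)
```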


4 The Improved Analysis 

In this section we prove the following theorem, which improves the bound of Theorem 3.1 in terms of the dependence on $\varepsilon$.

Theorem 4.1. For a sufficiently large $N$, a matrix $M \in \mathbb{C}^{N \times N}$, and sufficiently small $\varepsilon, \eta > 0$, the following holds. For some $q = O(\log^2(1/\varepsilon) \cdot \varepsilon^{-1}\eta^{-1} \log N \cdot \log^2(1/\eta))$, let $Q$ be a multiset of $q$ uniform and independent random elements of $[N]$. Then, with probability $1 - 2^{-\Omega(\log N \cdot \log(1/\eta))}$, it holds that for every $x \in \mathbb{C}^N$,

$\mathbb{E}_{j \in Q}[|(Mx)_j|^2] \approx_{\varepsilon,\; \eta \cdot \|x\|_1^2 \cdot \|M\|_\infty^2} \mathbb{E}_{j \in [N]}[|(Mx)_j|^2]$.   (7)

We can assume that $\varepsilon \ge \eta$, as otherwise, one can apply the theorem with parameters $\eta/2, \eta/2$ and derive (7) for $\varepsilon, \eta$ as well (because the right-hand side is bounded from above by $\|x\|_1^2 \cdot \|M\|_\infty^2$). As before, we assume without loss of generality that $\|M\|_\infty = 1$. For $\varepsilon \ge \eta > 0$, we define $t = \log_2(1/\eta)$ and $r = \log_2(1/\varepsilon^2)$. For the analysis given in this section, we define $\gamma = \eta/(60(t+r))$. Throughout the proof, we use the vector sets $\mathcal{G}_\ell$ from Section 3 and Lemma 3.5 for this value of $\gamma$.


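The Vector Sets $\mathcal{D}_{i,m}$. For a $(t+r)$-tuple of vectors $(g^{(1)}, \ldots, g^{(t+r)}) \in \mathcal{G}_1 \times \cdots \times \mathcal{G}_{t+r}$ and for $1 \le i \le t$, let $C_i$ be the set of all $j \in [N]$ for which $i$ is the smallest index satisfying $|g_j^{(i)}| \ge 2 \cdot 2^{-i/2}$. For $m = i, \ldots, i+r$ define the vector $h^{(i,m)}$ by

$h_j^{(i,m)} = |g_j^{(m)}|^2 \cdot 1_{j \in C_i}$,   (8)

and for other values of $m$ define $h^{(i,m)} = 0$. Now, for every $m$, let $\Delta^{(i,m)}$ be the vector defined by

$\Delta_j^{(i,m)} = h_j^{(i,m)} - h_j^{(i,m-1)}$ if $|h_j^{(i,m)} - h_j^{(i,m-1)}| \le 30 \cdot 2^{-(i+m)/2}$, and $\Delta_j^{(i,m)} = 0$ otherwise.   (9)

Note that the support of $\Delta^{(i,m)}$ is contained in $C_i$. Let $\mathcal{D}_{i,m}$ be the set of all vectors $\Delta^{(i,m)}$ that can be obtained in this way.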
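
For a single coordinate $j \in C_i$, the truncated differences can be sketched as below. This is an illustration, not the authors' code. It assumes the truncation threshold $30 \cdot 2^{-(i+m)/2}$ that appears in the proofs of Lemmas 4.3 and 4.4, takes the values $h_j^{(i,m)}$ as given input, and the function name `truncated_differences` is hypothetical.

```python
def truncated_differences(h_levels, i, r):
    """h_levels[m] = h_j^{(i,m)} for one fixed coordinate j and m = i..i+r.

    Returns the list [Delta_j^{(i,m)} for m = i..i+r].
    """
    deltas = []
    prev = 0.0  # h^{(i,i-1)} is the zero vector by convention
    for m in range(i, i + r + 1):
        diff = h_levels[m] - prev
        bound = 30 * 2 ** (-(i + m) / 2)
        deltas.append(diff if abs(diff) <= bound else 0.0)
        prev = h_levels[m]
    return deltas

# When no difference is truncated, the sum telescopes to the last level.
levels = {1: 2.0, 2: 2.1, 3: 2.05}
deltas = truncated_differences(levels, i=1, r=2)
print(deltas, sum(deltas))

# A difference above the threshold (here 100 > 30 * 2**-1) is replaced by 0.
print(truncated_differences({1: 100.0, 2: 100.0, 3: 100.0}, i=1, r=2))
```

The truncation keeps every $\Delta_j^{(i,m)}$ bounded, which is what makes the Chernoff-Hoeffding corollaries applicable, while the telescoping sum recovers $h_j^{(i,i+r)}$ whenever no truncation takes effect.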

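Claim 4.2. For every $1 \le i \le t$ and $i \le m \le i + r$, $|\mathcal{D}_{i,m}| \le N^{O(2^m \cdot \log(1/\gamma))}$.

Proof: Observe that every vector in $\mathcal{D}_{i,m}$ is fully defined by some $(g^{(1)}, \ldots, g^{(m)}) \in \mathcal{G}_1 \times \cdots \times \mathcal{G}_m$. Hence

$|\mathcal{D}_{i,m}| \le |\mathcal{G}_1| \cdots |\mathcal{G}_m| \le N^{O(\log(1/\gamma)) \cdot (2^1 + 2^2 + \cdots + 2^m)} \le N^{O(\log(1/\gamma)) \cdot 2^{m+1}}$,

and the claim follows. ■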

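Lemma 4.3. For every $\tilde{\varepsilon}, \tilde{\eta} > 0$ and some $q = O(\tilde{\varepsilon}^{-1}\tilde{\eta}^{-1} \log N \cdot \log(1/\gamma))$, let $Q$ be a multiset of $q$ uniform and independent random elements of $[N]$. Then, with probability $1 - 2^{-\Omega(\log N \cdot \log(1/\gamma))}$, it holds that for every $1 \le i \le t$, $m$, and a vector $\Delta^{(i,m)} \in \mathcal{D}_{i,m}$ associated with a set $C_i$,

$\mathbb{E}_{j \in Q}[\Delta_j^{(i,m)}] \approx_{0,b} \mathbb{E}_{j \in [N]}[\Delta_j^{(i,m)}]$ for $b = O(\tilde{\varepsilon} \cdot 2^{-i} \cdot \frac{|C_i|}{N} + \tilde{\eta})$.   (10)

Proof: Fix $i$, $m$, and a vector $\Delta^{(i,m)} \in \mathcal{D}_{i,m}$ associated with a set $C_i$ as in (9). Notice that

$\mathbb{E}_{j \in [N]}[|\Delta_j^{(i,m)}|] \le 30 \cdot 2^{-(i+m)/2} \cdot \frac{|C_i|}{N}$.

By Corollary 2.4, applied with

$\varepsilon' = \tilde{\varepsilon} \cdot 2^{(m-i)/2}$, $\alpha = \tilde{\eta}$, and $a = 30 \cdot 2^{-(i+m)/2}$,

we have that (10) holds with probability $1 - 2^{-\Omega(2^m \cdot q \cdot \tilde{\varepsilon} \cdot \tilde{\eta})}$. Using Claim 4.2, the union bound over all the vectors in $\mathcal{D}_{i,m}$ implies that the probability that some $\Delta^{(i,m)} \in \mathcal{D}_{i,m}$ does not satisfy (10) is at most

$N^{O(2^m \cdot \log(1/\gamma))} \cdot 2^{-\Omega(2^m \cdot q \cdot \tilde{\varepsilon} \cdot \tilde{\eta})} \le 2^{-\Omega(2^m \cdot \log N \cdot \log(1/\gamma))}$.

The result follows by a union bound over $i$ and $m$. ■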

Approximating the Vectors Mx. 

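Lemma 4.4. For every multiset $Q \subseteq [N]$ and every vector $x \in \mathbb{C}^N$ with $\|x\|_1 = 1$ there exist vector collections $(\Delta^{(i,m)} \in \mathcal{D}_{i,m})_{m=i,\ldots,i+r}$ associated with sets $C_i$ ($1 \le i \le t$), for which

1. $\mathbb{E}_{j \in [N]}[|(Mx)_j|^2] \ge \sum_{i=1}^{t} 2^{-i} \cdot \frac{|C_i|}{N} - \eta$,

2. $\mathbb{E}_{j \in Q}[|(Mx)_j|^2] \approx_{O(\varepsilon),O(\eta)} \mathbb{E}_{j \in Q}[\sum_{i=1}^{t} \sum_{m=i}^{i+r} \Delta_j^{(i,m)}]$, and

3. $\mathbb{E}_{j \in [N]}[|(Mx)_j|^2] \approx_{O(\varepsilon),O(\eta)} \mathbb{E}_{j \in [N]}[\sum_{i=1}^{t} \sum_{m=i}^{i+r} \Delta_j^{(i,m)}]$.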

Proof: By Lemma 3.5, for every $1 \le \ell \le t + r$ there exists a vector $g^{(\ell)} \in \mathcal{G}_\ell$ that satisfies

$|(Mx)_j| \approx_{0,2^{-\ell/2}} |g_j^{(\ell)}|$   (11)

for all but at most $\gamma$ fraction of $j \in [N]$ and for all but at most $\gamma$ fraction of $j \in Q$. We say that $j \in [N]$ is good if (11) holds for every $\ell$, and otherwise that it is bad. Notice that all but at most $(t+r)\gamma$ fraction of $j \in [N]$ are good and that all but at most $(t+r)\gamma$ fraction of $j \in Q$ are good. Consider the sets $C_i$ and vectors $\Delta^{(i,m)}$ associated with $(g^{(1)}, \ldots, g^{(t+r)})$ as defined in (9). We claim that $\Delta^{(i,m)}$ satisfy the requirements of the lemma.

Fix some $1 \le i \le t$. For every good $j \in C_i$, the definition of $C_i$ implies that $|g_j^{(i)}| \ge 2 \cdot 2^{-i/2}$, so using (11) it follows that

$|(Mx)_j| \ge |g_j^{(i)}| - 2^{-i/2} \ge 2^{-i/2}$.   (12)

We also claim that $|(Mx)_j| \le 3 \cdot 2^{-(i-1)/2}$. This trivially holds for $i = 1$, so assume that $i \ge 2$, and notice that the definition of $C_i$ implies that $|g_j^{(i-1)}| < 2 \cdot 2^{-(i-1)/2}$, so using (11), it follows that

$|(Mx)_j| \le |g_j^{(i-1)}| + 2^{-(i-1)/2} \le 3 \cdot 2^{-(i-1)/2}$.   (13)

Since the sets $C_i$ are disjoint and at most $(t+r)\gamma$ fraction of $j \in [N]$ are bad, (12) yields that

$\mathbb{E}_{j \in [N]}[|(Mx)_j|^2] \ge \sum_{i=1}^{t} 2^{-i} \cdot \frac{|C_i|}{N} - (t+r)\gamma/2 \ge \sum_{i=1}^{t} 2^{-i} \cdot \frac{|C_i|}{N} - \eta$,

as required for Item 1.

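Next, we claim that every good $j$ satisfies

$|(Mx)_j|^2 \approx_{O(\varepsilon),O(\eta)} \sum_{i=1}^{t} h_j^{(i,i+r)}$.   (14)

For a good $j \in C_i$ and $m \ge i$,

$\bigl|\, |(Mx)_j|^2 - |g_j^{(m)}|^2 \,\bigr| \le 2 \cdot |(Mx)_j| \cdot 2^{-m/2} + 2^{-m} \le 10 \cdot 2^{-(i+m)/2}$,   (15)

where the first inequality follows from (11) and the second from (13). In particular, for $m = i + r$ (recall that $r = \log_2(1/\varepsilon^2)$), we have

$\bigl|\, |(Mx)_j|^2 - h_j^{(i,i+r)} \,\bigr| \le 10 \cdot \varepsilon \cdot 2^{-i} \le 10 \cdot \varepsilon \cdot |(Mx)_j|^2$,

and thus $|(Mx)_j|^2 \approx_{O(\varepsilon),0} h_j^{(i,i+r)}$. Since every good $j$ belongs to at most one of the sets $C_i$, for every good $j \in \bigcup_i C_i$ we have $|(Mx)_j|^2 \approx_{O(\varepsilon),0} \sum_{i=1}^{t} h_j^{(i,i+r)}$. On the other hand, if $j$ is good but does not belong to any $C_i$, by our choice of $t$, it satisfies

$|(Mx)_j| \le |g_j^{(t)}| + 2^{-t/2} \le 3 \cdot 2^{-t/2} = 3\sqrt{\eta}$,

and thus $|(Mx)_j|^2 \approx_{0,9\eta} 0 = \sum_{i=1}^{t} h_j^{(i,i+r)}$. This establishes that (14) holds for every good $j$.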

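Next, we claim that for every good $j$,

$|(Mx)_j|^2 \approx_{O(\varepsilon),O(\eta)} \sum_{i=1}^{t} \sum_{m=i}^{i+r} \Delta_j^{(i,m)}$.   (16)

This follows since for every $1 \le i \le t$, the vector $h^{(i,i+r)}$ can be written as the telescopic sum

$h^{(i,i+r)} = \sum_{m=i}^{i+r} (h^{(i,m)} - h^{(i,m-1)})$,

where we used that $h^{(i,i-1)} = 0$. We claim that for every good $j$, these differences satisfy

$|h_j^{(i,m)} - h_j^{(i,m-1)}| \le 30 \cdot 2^{-(i+m)/2}$,

thus establishing that (16) holds for every good $j$. Indeed, for $m \ge i + 1$, (15) implies that

$|h_j^{(i,m)} - h_j^{(i,m-1)}| \le \bigl|\, |(Mx)_j|^2 - h_j^{(i,m)} \,\bigr| + \bigl|\, |(Mx)_j|^2 - h_j^{(i,m-1)} \,\bigr| \le 10 \cdot (2^{-(i+m)/2} + 2^{-(i+m-1)/2}) \le 30 \cdot 2^{-(i+m)/2}$,

and for $m = i$ it follows from (11) combined with (13).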

Finally, for every bad $j$ we have

$\bigl|\, |(Mx)_j|^2 - \sum_{i=1}^{t} \sum_{m=i}^{i+r} \Delta_j^{(i,m)} \,\bigr| \le 1 + 30 \cdot \max_{1 \le i \le t} \bigl( \sum_{m=i}^{i+r} 2^{-(i+m)/2} \bigr) \le 60$.   (17)

Since at most $(t+r)\gamma$ fraction of the elements in $[N]$ and in $Q$ are bad, their effect on the difference between the expectations in Items 2 and 3 can be bounded by $60(t+r)\gamma$. By our choice of $\gamma$ this is $\eta$, as required. ■

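Finally, we are ready to prove Theorem 4.1.

Proof of Theorem 4.1: Recall that it can be assumed that $\varepsilon \ge \eta$. By Lemma 4.3, applied with $\tilde{\varepsilon} = \varepsilon/r$ and $\tilde{\eta} = \eta/(rt)$, a random multiset $Q$ of size

$q = O(\frac{r^2}{\varepsilon\eta} \cdot t \cdot \log N \cdot \log(1/\gamma)) = O(\log^2(1/\varepsilon) \cdot \varepsilon^{-1}\eta^{-1} \log N \cdot \log^2(1/\eta))$

satisfies with probability $1 - 2^{-\Omega(\log N \cdot \log(1/\eta))}$ that for every $1 \le i \le t$, $m$, and $\Delta^{(i,m)} \in \mathcal{D}_{i,m}$ associated with a set $C_i$,

$\mathbb{E}_{j \in Q}[\Delta_j^{(i,m)}] \approx_{0,b_i} \mathbb{E}_{j \in [N]}[\Delta_j^{(i,m)}]$ for $b_i = O(\frac{\varepsilon}{r} \cdot 2^{-i} \cdot \frac{|C_i|}{N} + \frac{\eta}{rt})$,

in which case we also have

$\mathbb{E}_{j \in Q}[\sum_{i=1}^{t} \sum_{m=i}^{i+r} \Delta_j^{(i,m)}] \approx_{0,b} \mathbb{E}_{j \in [N]}[\sum_{i=1}^{t} \sum_{m=i}^{i+r} \Delta_j^{(i,m)}]$ for $b = O(\varepsilon \cdot \sum_{i=1}^{t} 2^{-i} \cdot \frac{|C_i|}{N} + \eta)$.   (18)

We show that a $Q$ with the above property satisfies the requirement of the theorem. Let $x \in \mathbb{C}^N$ be a vector, and assume without loss of generality that $\|x\|_1 = 1$. By Lemma 4.4, there exist vector collections $(\Delta^{(i,m)} \in \mathcal{D}_{i,m})_{m=i,\ldots,i+r}$ associated with sets $C_i$ ($1 \le i \le t$), satisfying Items 1, 2, and 3 there. Combined with (18), this gives

$\mathbb{E}_{j \in Q}[|(Mx)_j|^2] \approx_{O(\varepsilon),O(\eta)} \mathbb{E}_{j \in [N]}[|(Mx)_j|^2]$,

and we are done. ■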

4.1 The Restricted Isometry Property

It is easy to derive now the following theorem. The proof is essentially identical to that of Theorem 3.7 using Theorem 4.1 instead of Theorem 3.1.

Theorem 4.5. For sufficiently large $N$ and $k$, a unitary matrix $M \in \mathbb{C}^{N \times N}$ satisfying $\|M\|_\infty \le O(1/\sqrt{N})$, and a sufficiently small $\varepsilon > 0$, the following holds. For some $q = O(\log^2(1/\varepsilon) \cdot \varepsilon^{-2} \cdot k \cdot \log^2(k/\varepsilon) \cdot \log N)$, let $A \in \mathbb{C}^{q \times N}$ be a matrix whose $q$ rows are chosen uniformly and independently from the rows of $M$, multiplied by $\sqrt{N/q}$. Then, with probability $1 - 2^{-\Omega(\log N \cdot \log(k/\varepsilon))}$, the matrix $A$ satisfies the restricted isometry property of order $k$ with constant $\varepsilon$.